C4.5 to FOIL Format
FOIL ships with a C program, c4tofoil.c, that converts a dataset from C4.5 format to FOIL format.

C4.5 Format
A dataset in C4.5 format is composed of two parts: a .data file and a .names file. This is the input format used by Quinlan's C4.5 decision tree learner, which is presumably why it is called the C4.5 format and why Quinlan provides a program to convert it to FOIL input automatically. A more detailed introduction to the C4.5 format can be found at http://www.cs.washington.edu/dm/vfml/appendixes/c45.htm

Compiling the code
    cd FOIL                 # FOIL is where you have all the code extracted from the shell archive
    gcc c4tofoil.c -o cf    # compile the code and name the executable file cf
    ./cf -f crx             # specify the file stem you want to convert; crx.data and crx.names
                            # should exist in the current folder, crx.test is optional
                            # the output will be stored in crx.d

Notes:
The header of c4tofoil.c contains comments on how the *.data and *.names files should be formatted. Make sure you follow them. For convenience, the header is pasted here:

    /*****************************************************************************/
    /* */
    /* Program to convert files from the standard C4.5 input format to a form */
    /* that can be used by FOIL */
    /* */
    /* The relation to be found by FOIL will be of the form - */
    /* is first class named in the .names file */
    /* */
    /* Hence changing the order of the class names will cause FOIL to find other */
    /* relations from the same data */
    /* */
    /* Compilation and use: */
    /* cc -o cf c4tofoil.c (produce executable cf) */
    /* cf -f filestem (take filestem.names and filestem.data (and filestem.test */
    /* if present) and produce filestem.d for FOIL) */
    /* option -v produces some extra output on the standard output */
    /* */
    /* (Any error messages are currently printed on the standard output stream) */
    /* */
    /* Modification required to filestem.names: */
    /* Each line containing attribute information should have information */
    /* specifying the type - this is added as a C4.5 comment thus... */
    /* */
    /* (attribute info for C4.5) | type: typename */
    /* */
    /* where typename is the name of the type of this attribute. Each typename */
    /* shall start with a letter (upper or lower case) and no typename shall be */
    /* the prefix of another typename. */
    /* (The latter restriction is required as the output for FOIL distinguishes */
    /* between constants of different types by prefixing them with their */
    /* typename). */
    /* */
    /* For example: */
    /* aardvarkish: true, false. | type: Boolean */
    /* */
    /* Note that values of discrete attributes which occur in the data file */
    /* become theory constants for FOIL. (However those that occur in the test */
    /* file but not data file, are just constants, not theory constants). */
    /* */
    /*****************************************************************************/

BTW, in my experiments, line endings ('\n') need extra attention. Make sure the input files use Unix line endings. Sometimes deleting empty lines also makes the conversion work.
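
The line-ending advice above can be scripted. What follows is a minimal sketch, assuming a POSIX shell with tr and sed available. The function name normalize_c45 and the script name normalize.sh are hypothetical and not part of FOIL. It strips carriage returns and deletes empty (or whitespace-only) lines from the .data, .names and, if present, .test file of a given file stem. The files are overwritten in place, so keep a backup copy.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOIL): clean the C4.5 input files of one
# file stem before running ./cf -f <stem>.
normalize_c45() {
    stem=$1
    for ext in data names test; do
        f="$stem.$ext"
        # The .test file is optional, so silently skip files that do not exist.
        [ -f "$f" ] || continue
        # Step 1: drop carriage returns (DOS/Windows line endings).
        # Step 2: drop empty or whitespace-only lines.
        tr -d '\r' < "$f" | sed '/^[[:space:]]*$/d' > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# When run as a script, the first argument is the file stem, e.g. crx.
if [ $# -gt 0 ]; then
    normalize_c45 "$1"
fi
```

For example, running sh normalize.sh crx before ./cf -f crx cleans crx.data, crx.names and crx.test (if it exists) in the current folder.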